The novel coronavirus became a global pandemic in early 2020, and governments worldwide were tasked with working out how to emerge from it. Understandably, there was a large focus on vaccines. The objective of this study is to better understand how, throughout 2021, the vaccines affected the severity of COVID and the transmission of cases. The data were collected by Our World in Data in partnership with Oxford University and cover countries from every continent. Responsibility for reporting the data fell to the individual nations.
We analyse the following hypotheses:
H1 - An increase in the number of people vaccinated will reduce the number of deaths
H2 - An increase in the number of people vaccinated will reduce the severity of COVID
H3 - An increase in the number of people vaccinated will not affect the transmission of cases.
Independent variables (Standardized x hundred)
Dependent variables (Standardized x hundred)
Latent/control variables (Standardized x hundred)
[1,] "CONT" "continent"
[2,] "COUN" "location"
[3,] "CASES.T" "total_cases_per_million"
[4,] "CASES.N" "new_cases_per_million"
[5,] "DEATHS.T" "total_deaths_per_million"
[6,] "DEATHS.N" "new_deaths_per_million"
[7,] "ICU" "icu_patients_per_million"
[8,] "HOSP" "hosp_patients_per_million"
[9,] "VAC.T" "total_vaccinations"
[10,] "PEOPLE.V" "people_vaccinated"
[11,] "VAC.p" "total_vaccinations_per_hundred"
[12,] "PEOPLE.V.p" "people_vaccinated_per_hundred"
[13,] "POP" "population"
[14,] "GDP.PC" "gdp_per_capita"
[15,] "HEART.R" "cardiovasc_death_rate"
[16,] "BEDS" "hospital_beds_per_thousand"
[17,] "HDI" "human_development_index"
[18,] "DATE" "date.1"
Firstly, vaccinations appear to have little effect on reducing the number of cases. With every unit increase in vaccinations per 100 people, there is an increase of 0.55 cases. This may suggest that the presence of a vaccine encouraged societies to take less care and fast-tracked governments' relaxation of the rules.
Secondly, there is a clear reduction in ICU patients as vaccinations per hundred people increase. With every unit increase in vaccinations per 100 people, there is a 0.24 decrease in the number of ICU patients per million. This suggests the vaccine is reducing the severity of COVID much more than the transmission of cases.
Furthermore, the number of new deaths per million decreases by 0.08 with each unit increase in vaccinations per 100 people. This strengthens the idea that the severity of COVID and, as a by-product, deaths are reduced as vaccine take-up increases.
From this we can settle on the following hypotheses:
H1 - An increase in the number of people vaccinated will reduce the number of deaths
H2 - An increase in the number of people vaccinated will reduce the severity of COVID
H3 - An increase in the number of people vaccinated will not affect the transmission of cases.
The above graph shows how the number of deaths evolved between May 2020 and January 2022 in Spain. Note that this is the number of new deaths, not cumulative deaths, which explains the volatile nature of the data.
Call:
lm(formula = d$DEATHS.N ~ d$CASES.N + d$PEOPLE.V.p)
Residuals:
Min 1Q Median 3Q Max
-11.525 0.011 1.194 2.031 14.468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.454575 0.032097 -107.63 <2e-16 ***
d$CASES.N 0.606850 0.004508 134.62 <2e-16 ***
d$PEOPLE.V.p -0.201922 0.008634 -23.39 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.528 on 40961 degrees of freedom
(122595 observations deleted due to missingness)
Multiple R-squared: 0.3071, Adjusted R-squared: 0.3071
F-statistic: 9079 on 2 and 40961 DF, p-value: < 2.2e-16
Call:
lm(formula = d$DEATHS.N ~ d$PEOPLE.V.p + factor(d$CONT))
Residuals:
Min 1Q Median 3Q Max
-10.068 -1.806 1.370 2.878 11.842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.10401 0.06087 1.709 0.0875 .
d$PEOPLE.V.p -0.10618 0.01016 -10.452 < 2e-16 ***
factor(d$CONT)Africa -3.87812 0.08481 -45.729 < 2e-16 ***
factor(d$CONT)Asia -2.38576 0.07030 -33.939 < 2e-16 ***
factor(d$CONT)Europe -1.33249 0.06743 -19.762 < 2e-16 ***
factor(d$CONT)North America -2.18187 0.08074 -27.025 < 2e-16 ***
factor(d$CONT)Oceania -6.29534 0.14311 -43.989 < 2e-16 ***
factor(d$CONT)South America 0.43593 0.08879 4.910 9.16e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.011 on 40989 degrees of freedom
(122562 observations deleted due to missingness)
Multiple R-squared: 0.1045, Adjusted R-squared: 0.1044
F-statistic: 683.4 on 7 and 40989 DF, p-value: < 2.2e-16
Based on the first regression above, where deaths are standardized per million, we have evidence to support hypothesis one. As expected, an increase in the number of new cases is associated with an increase in the number of new deaths (coefficient 0.607). Holding cases constant, the coefficient on people vaccinated per hundred is -0.202: each unit increase in people vaccinated per hundred is associated with a decrease of about 0.20 in new deaths per million, on the scale used in the model. This supports the notion that an increase in vaccinations is accompanied by a decrease in the number of deaths.
We can also see from the second regression that the effect on new deaths remains negative once continent is included (coefficient -0.106), which leads us to believe there is less merit in exploring this relationship by continent.
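As a minimal sketch of how a coefficient of this kind is read, the following fits the same type of model on simulated data. The data frame `sim`, its column names and the true coefficients (chosen to resemble the estimates above) are assumptions for illustration, not the study's data.

```r
# Simulated stand-in for the study data; all values here are hypothetical.
set.seed(1)
n <- 1000
sim <- data.frame(
  cases = runif(n, 0, 10),   # stand-in for new cases
  vacc  = runif(n, 0, 10)    # stand-in for people vaccinated per hundred
)
sim$deaths <- -3.5 + 0.6 * sim$cases - 0.2 * sim$vacc + rnorm(n)

# Passing data = avoids repeating the data frame name inside the formula.
fit <- lm(deaths ~ cases + vacc, data = sim)

# Estimated change in deaths per unit increase in vacc, holding cases fixed.
b_vacc <- unname(coef(fit)["vacc"])
print(round(coef(fit), 3))
```

A negative `b_vacc` is read as: for two observations with the same number of cases, the one with one more unit of vaccination is expected to have that many fewer deaths.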
Call:
lm(formula = ES$DEATHS.N ~ ES$CASES.N + ES$HOSP + ES$ICU + ES$PEOPLE.V.p)
Residuals:
Min 1Q Median 3Q Max
-9.5238 -0.3988 0.1446 0.4841 1.6498
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.414659 0.524614 -4.603 6.32e-06 ***
ES$CASES.N 0.653489 0.009143 71.475 < 2e-16 ***
ES$HOSP -0.872072 0.185441 -4.703 4.03e-06 ***
ES$ICU 1.277205 0.258540 4.940 1.34e-06 ***
ES$PEOPLE.V.p -0.226833 0.056441 -4.019 7.51e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9231 on 281 degrees of freedom
(466 observations deleted due to missingness)
Multiple R-squared: 0.9513, Adjusted R-squared: 0.9506
F-statistic: 1371 on 4 and 281 DF, p-value: < 2.2e-16
The regression above makes interesting reading. The effect of the number of people vaccinated per hundred in Spain extends the analysis from the pairs panel above: as vaccinations increase, the number of deaths decreases (coefficient -0.227). Interestingly, the total number of vaccinations per hundred people appears to have increased the number of ICU patients per million. This is likely a reflection of the booster doses that have been issued, which are unlikely to reduce the number of ICU patients per million further but instead maintain the level that the initial rounds of vaccination achieved.
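# Two panels side by side: ICU patients against people vaccinated,
# then ICU patients against hospital patients (one plot per right-hand term).
par(mfrow = c(1, 2))
plot(ES$ICU ~ ES$PEOPLE.V.p + ES$HOSP)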
chisq df pvalue rmsea
2333.0385 7.0000 0.0000 1.0779
chisq df pvalue rmsea
626.0573 4.0000 0.0000 0.7374
As previously explained, there is a negative correlation between people vaccinated per hundred and the number of ICU patients per million. This is best depicted in the graphs above. At 0 vaccinations per hundred, there are close to 100 ICU patients per million. It should be noted that Spanish hospitals were reported to be at capacity throughout the pandemic, which suggests that, without capacity constraints, these numbers may have been higher. As the number of people vaccinated increased, the number of ICU patients decreased.
The graphs remain relatively volatile throughout the range of number of people vaccinated per hundred which can be interpreted to represent the evolving nature of the virus. The number of people vaccinated could be used as a proxy for time, as this number is only capable of increasing. Therefore the peaks on the graphs above are likely to represent the waves of the pandemic.
For this model, four latent variables were defined: Cases, Vaccination, Hospitalization and Deaths. Each is measured by four indicators, namely the daily average per million in each quarter of cases, vaccinations, hospitalizations and deaths respectively.
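The lavaan output that follows shows every loading fixed at 1.000, Deaths regressed on Cases, Hospitalization and Vaccination, Hospitalization regressed on Vaccination, and a covariance between Cases and Vaccination. A model string consistent with that output might look like the sketch below. The original model code is not shown in this report, so this is a reconstruction, and the data frame name `q` in the commented call is hypothetical.

```r
# Reconstructed lavaan syntax (an assumption based on the printed output).
# The 1* prefix fixes a loading at 1 instead of estimating it.
model <- '
  Ca  =~ 1*Cases.NQ1  + 1*Cases.NQ2  + 1*Cases.NQ3  + 1*Cases.NQ4
  Hos =~ 1*HospQ1     + 1*HospQ2     + 1*HospQ3     + 1*HospQ4
  Dea =~ 1*Deaths.NQ1 + 1*Deaths.NQ2 + 1*Deaths.NQ3 + 1*Deaths.NQ4
  Vac =~ 1*People.VQ1 + 1*People.VQ2 + 1*People.VQ3 + 1*People.VQ4

  Dea ~ Ca + Hos + Vac
  Hos ~ Vac
  Ca ~~ Vac
'

# Parse the syntax only if lavaan is installed; no data are needed for this step.
if (requireNamespace("lavaan", quietly = TRUE)) {
  pt <- lavaan::lavaanify(model)
  print(pt[pt$op == "~", c("lhs", "op", "rhs")])
}

# Fitting would then be along these lines (q is a hypothetical data frame):
# fit <- lavaan::sem(model, data = q)
```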
The bivariate analysis shows correlations among the variables. Initially no multicollinearity is detected, meaning there is variation between variables and between periods.
[1] "Country" "GDPQ1" "HDIQ1" "LifeExpQ1" "BedsQ1"
[6] "UCIQ1" "HrthAttcksQ1" "DiabetesQ1" "Deaths.NQ1" "People.VQ1"
[11] "HospQ1" "Cases.NQ1" "GDPQ2" "HDIQ2" "LifeExpQ2"
[16] "BedsQ2" "UCIQ2" "HrthAttcksQ2" "DiabetesQ2" "Deaths.NQ2"
[21] "People.VQ2" "HospQ2" "Cases.NQ2" "GDPQ3" "HDIQ3"
[26] "LifeExpQ3" "BedsQ3" "UCIQ3" "HrthAttcksQ3" "DiabetesQ3"
[31] "Deaths.NQ3" "People.VQ3" "HospQ3" "Cases.NQ3" "GDPQ4"
[36] "HDIQ4" "LifeExpQ4" "BedsQ4" "UCIQ4" "HrthAttcksQ4"
[41] "DiabetesQ4" "Deaths.NQ4" "People.VQ4" "HospQ4" "Cases.NQ4"
chisq df pvalue rmsea
683.6484 111.0000 0.0000 0.3839
lavaan 0.6-10 ended normally after 308 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 25
Used Total
Number of observations 35 238
Model Test User Model:
Test statistic 683.648
Degrees of freedom 111
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Ca =~
Cases.NQ1 1.000
Cases.NQ2 1.000
Cases.NQ3 1.000
Cases.NQ4 1.000
Hos =~
HospQ1 1.000
HospQ2 1.000
HospQ3 1.000
HospQ4 1.000
Dea =~
Deaths.NQ1 1.000
Deaths.NQ2 1.000
Deaths.NQ3 1.000
Deaths.NQ4 1.000
Vac =~
People.VQ1 1.000
People.VQ2 1.000
People.VQ3 1.000
People.VQ4 1.000
Regressions:
Estimate Std.Err z-value P(>|z|)
Dea ~
Ca 0.001 0.004 0.312 0.755
Hos -0.024 1.846 -0.013 0.989
Vac -2.231 116.084 -0.019 0.985
Hos ~
Vac -63.462 35.089 -1.809 0.071
Covariances:
Estimate Std.Err z-value P(>|z|)
Ca ~~
Vac -82.771 58.901 -1.405 0.160
Variances:
Estimate Std.Err z-value P(>|z|)
.Cases.NQ1 34767.876 8681.043 4.005 0.000
.Cases.NQ2 3439.116 1749.604 1.966 0.049
.Cases.NQ3 20599.345 5310.638 3.879 0.000
.Cases.NQ4 62065.619 15196.692 4.084 0.000
.HospQ1 13015.753 3425.370 3.800 0.000
.HospQ2 3096.670 1253.704 2.470 0.014
.HospQ3 19230.912 4898.543 3.926 0.000
.HospQ4 8727.017 2421.593 3.604 0.000
.Deaths.NQ1 13.016 3.073 4.235 0.000
.Deaths.NQ2 2.318 0.593 3.906 0.000
.Deaths.NQ3 3.020 0.745 4.054 0.000
.Deaths.NQ4 6.715 1.588 4.230 0.000
.People.VQ1 23.888 6.443 3.708 0.000
.People.VQ2 101.341 24.458 4.143 0.000
.People.VQ3 134.993 32.463 4.158 0.000
.People.VQ4 118.692 28.583 4.153 0.000
Ca 4447.226 1961.749 2.267 0.023
.Hos -226.655 11878.744 -0.019 0.985
.Dea -0.136 14.860 -0.009 0.993
Vac 5.998 4.543 1.320 0.187
In this model we see strong correlations between the latent variables Cases, Vaccination, Deaths and Hospitalization and their respective quarterly indicators. Furthermore, Vaccination has a high negative beta regression coefficient (-3.30) against Deaths, meaning that, as expected, the more vaccinations are administered, the fewer deaths are expected. Vaccination also has a high negative beta regression coefficient (-1) with Hospitalization, meaning vaccines are helping to decrease severe cases. Another feature is that Cases has a positive beta regression coefficient with Deaths, meaning that the more cases there are, the more deaths are expected.
Finally, this model indicates, with a high negative beta regression coefficient, that hospitalization decreases deaths. The underlying assumption is that the more patients are hospitalized, the fewer deaths result; countries should therefore invest heavily in hospitals to diminish COVID severity. Note, however, that in the output above none of the four regression paths is statistically significant at the 5% level (the smallest p-value is 0.071, for Hospitalization on Vaccination), so these readings are tentative.
The disturbance terms for Cases are 0.89, 0.44, 0.82 and 0.93 for Q1, Q2, Q3 and Q4, meaning that 11%, 56%, 18% and 7% of the variance in each quarter is caused by variables not controlled for. The disturbances are low, probably because COVID spreads similarly among similar countries, so there are not many external variables that explain variations in new cases.
The disturbance terms for Vaccination are 0.8, 0.94, 0.96 and 0.95 for Q1, Q2, Q3 and Q4, meaning that 20%, 6%, 4% and 5% of the variance in each quarter is caused by variables not controlled for. These low disturbance terms are satisfactory for the model: the amount of vaccination applied should not depend on many external variables other than the country, since vaccines were distributed around the world in proportion to the population of each country.
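The percentages follow from subtracting each reported value from 1. A short check, assuming (as the percentages imply) that the reported values are the explained shares of variance; the function name `unexplained` is illustrative:

```r
# Reported values per quarter (Q1 to Q4), taken from the text above.
cases_vals <- c(0.89, 0.44, 0.82, 0.93)
vacc_vals  <- c(0.80, 0.94, 0.96, 0.95)

# Share of variance left unexplained, in percent.
unexplained <- function(x) round(100 * (1 - x))

print(unexplained(cases_vals))
print(unexplained(vacc_vals))
```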
However, the disturbance terms for Hospitalization seem to be higher. This is because not only hospital capacity but also health investment and lifestyle differ greatly from country to country.
This model presents some issues: the p-value of the chi-square test is very low and the degrees of freedom are high, meaning the fit is not reliable. Further investigation is needed to find a more suitable model, which is why modification indices were used to look for a better fit.
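The fit statistics reported in this document can be checked by hand. The sketch below assumes the RMSEA formula sqrt(max(chisq - df, 0) / (df * N)), with N the number of observations used; this form reproduces the RMSEA values printed for the three lavaan models. The function names are illustrative.

```r
# Degrees of freedom: unique variances and covariances among p observed
# variables, minus the number of free parameters.
model_df <- function(p, n_par) p * (p + 1) / 2 - n_par

# RMSEA from the chi-square statistic, its df and the sample size used.
# Assumed formula; it matches the values printed in this document.
rmsea <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * n))

# First model: 16 indicators, 25 parameters, 35 observations used.
print(model_df(16, 25))
print(round(rmsea(683.648, model_df(16, 25), 35), 4))

# Later models: chi-square, df and observations used, as printed in their output.
print(round(rmsea(61.903, 8, 237), 4))
print(round(rmsea(49.567, 14, 141), 4))
```

By the common rule of thumb that an RMSEA above roughly 0.10 indicates poor fit, all three values point to poorly fitting models, the first most of all.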
It is also important to note that there are many missing values, since some countries do not report their data accurately.
[1] "Country" "GDPQ1" "HDIQ1" "LifeExpQ1" "BedsQ1"
[6] "UCIQ1" "HrthAttcksQ1" "DiabetesQ1" "Deaths.NQ1" "People.VQ1"
[11] "HospQ1" "Cases.NQ1" "GDPQ2" "HDIQ2" "LifeExpQ2"
[16] "BedsQ2" "UCIQ2" "HrthAttcksQ2" "DiabetesQ2" "Deaths.NQ2"
[21] "People.VQ2" "HospQ2" "Cases.NQ2" "GDPQ3" "HDIQ3"
[26] "LifeExpQ3" "BedsQ3" "UCIQ3" "HrthAttcksQ3" "DiabetesQ3"
[31] "Deaths.NQ3" "People.VQ3" "HospQ3" "Cases.NQ3" "GDPQ4"
[36] "HDIQ4" "LifeExpQ4" "BedsQ4" "UCIQ4" "HrthAttcksQ4"
[41] "DiabetesQ4" "Deaths.NQ4" "People.VQ4" "HospQ4" "Cases.NQ4"
[46] "lc1" "lc2" "lc3" "lc4" "lh1"
[51] "lh2" "lh3" "lh4" "ld1" "ld2"
[56] "ld3" "ld4" "lv1" "lv2" "lv3"
[61] "lv4"
[1] "lc1" "lc2" "lc3" "lc4" "lv1" "lv2" "lv3" "lv4"
[9] "GDPQ1" "BedsQ1"
lavaan 0.6-10 ended normally after 55 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 27
Used Total
Number of observations 237 238
Number of missing patterns 16
Model Test User Model:
Test statistic 61.903
Degrees of freedom 8
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Ca =~
lc1 1.000
lc2 0.995 0.061 16.397 0.000
lc3 1.001 0.089 11.199 0.000
lc4 1.258 0.108 11.621 0.000
Vac =~
lv2 1.000
lv3 1.174 0.066 17.765 0.000
lv4 0.977 0.080 12.231 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
Ca ~
Vac 0.993 0.119 8.366 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
.lc1 ~~
.lc2 0.543 0.172 3.152 0.002
.lc2 ~~
.lc4 -0.414 0.100 -4.162 0.000
.lv2 ~~
.lv4 -0.089 0.050 -1.773 0.076
.lv3 0.068 0.086 0.799 0.425
.lc3 ~~
.lc4 0.090 0.188 0.480 0.631
Intercepts:
Estimate Std.Err z-value P(>|z|)
.lc1 3.498 0.132 26.595 0.000
.lc2 3.429 0.122 28.085 0.000
.lc3 3.890 0.119 32.581 0.000
.lc4 3.858 0.140 27.539 0.000
.lv2 1.694 0.085 20.041 0.000
.lv3 2.724 0.085 31.918 0.000
.lv4 3.334 0.074 44.827 0.000
.Ca 0.000
Vac 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.lc1 1.571 0.213 7.365 0.000
.lc2 1.065 0.182 5.848 0.000
.lc3 0.897 0.176 5.108 0.000
.lc4 0.772 0.256 3.016 0.003
.lv2 0.410 0.128 3.197 0.001
.lv3 0.023 0.044 0.519 0.603
.lv4 0.120 0.032 3.702 0.000
.Ca 1.176 0.193 6.097 0.000
Vac 1.213 0.197 6.156 0.000
chisq df pvalue rmsea
61.9033 8.0000 0.0000 0.1686
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
50 lc3 ~~ lv4 15.502 0.079 0.079 0.239 0.239
45 lc2 ~~ lv2 11.889 -0.113 -0.113 -0.171 -0.171
49 lc3 ~~ lv3 8.695 -0.046 -0.046 -0.322 -0.322
41 lc1 ~~ lv2 6.323 0.086 0.086 0.107 0.107
53 lc4 ~~ lv4 5.349 -0.057 -0.057 -0.188 -0.188
In contrast to the previous models, the above model is not affected by the high number of missing values. Previously, the reporting of hospitalizations led to a very high number of missing values.
The aggregation of missing data, as shown above, reflects the high level of missingness in quarter 1 of 2021. This is centered around the rate of vaccine roll out around the world and therefore has been left out of this model.
The model shows us the effect of vaccination on cases. As previously mentioned, we did not expect the vaccine roll out to reduce the number of cases and thus the transmission of the virus. This is evident here as the estimate of the regression is 0.993. In context, this is likely to be explained by the Omicron variant. Furthermore, as vaccines were rolled out worldwide, governments were more lenient with restrictions, aware that this would increase the number of cases but confident that as the vaccines took hold the severity of COVID would reduce.
With more reliable data, we could implement the aforementioned model in which latent variables were assigned to assess the severity of COVID and prove this hypothesis. Regrettably, without this data, this explanation will have to suffice.
Whilst we still have some modification indices that are high, this is the limit of where we can introduce correlations between variables and still receive an output including standard errors. In the model, some of the correlations have been frozen, meaning they will not be included in the model but show where the desired correlations would be introduced if possible.
lavaan 0.6-10 ended normally after 32 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 16
Used Total
Number of observations 141 238
Model Test User Model:
Test statistic 49.567
Degrees of freedom 14
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Ca =~
lc1 1.000
lc2 0.797 0.084 9.441 0.000
lc3 0.766 0.097 7.923 0.000
lc4 1.323 0.137 9.684 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
Ca ~
lv2 0.315 0.173 1.825 0.068
lv3 0.260 0.340 0.766 0.443
lv4 -0.057 0.282 -0.203 0.839
lbed 0.502 0.118 4.274 0.000
lgdp 0.238 0.122 1.947 0.051
Covariances:
Estimate Std.Err z-value P(>|z|)
.lc1 ~~
.lc2 0.929 0.183 5.085 0.000
.lc2 ~~
.lc4 -0.144 0.112 -1.290 0.197
.lc3 0.304 0.106 2.857 0.004
Variances:
Estimate Std.Err z-value P(>|z|)
.lc1 1.625 0.218 7.445 0.000
.lc2 1.614 0.223 7.222 0.000
.lc3 0.920 0.124 7.406 0.000
.lc4 0.542 0.157 3.447 0.001
.Ca 0.489 0.120 4.064 0.000
chisq df pvalue rmsea
49.5674 14.0000 0.0000 0.1342
lhs op rhs mi epc sepc.all delta ncp power decision
34 lc1 ~~ lc4 1.266 -0.203 -0.217 0.1 0.306 0.086 (i)
35 lc3 ~~ lc4 0.672 0.114 0.161 0.1 0.519 0.111 (i)
33 lc1 ~~ lc3 0.129 0.044 0.036 0.1 0.672 0.130 (i)
Based on the above model, we can see the effect that a country's GDP and number of hospital beds per thousand have on its number of cases. In the path diagram, both variables have a positive effect on the number of cases. Hospital beds and GDP act as proxies for the level of development of the country, suggesting that more developed countries had more cases of COVID throughout the period analyzed. Initially this may seem counter-intuitive; however, it is more likely to be a reflection of how effectively the countries reported the data.
This model also suffers from missingness, although not to the same extent as the initial model, which suggests that there are reliability issues around how well GDP and hospital beds were reported. As we can see from the aggregation of missingness analysis, the levels of missingness in hospital beds and GDP are relatively high.
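A minimal base R sketch of how per-variable missingness and the number of complete rows can be tabulated. The toy data frame `toy` and its values are made up for illustration; the column names borrow `lc1`, `lgdp` and `lbed` from the output above.

```r
# Toy data with some missing values (hypothetical values).
toy <- data.frame(
  lc1  = c(3.1, 2.8, NA, 4.0, 3.6),
  lgdp = c(9.2, NA, NA, 10.1, 8.7),
  lbed = c(1.1, 0.9, 1.4, NA, 0.7)
)

# Share of missing values in each column.
miss_share <- colMeans(is.na(toy))
print(miss_share)

# Rows with no missing value in any column; listwise deletion keeps only these.
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

This mirrors the "Used" versus "Total" counts in the lavaan output: under listwise deletion, only rows complete on every model variable are kept.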
The modification indices of this model are very low, as are the values in the EPC column, which show how much the parameters would be expected to change if the correlations were included. From the modification indices, we can conclude that the effect of the rhs on the lhs is insignificant and would not have a great effect on the model if it were included.
In summary, we identified three hypotheses relating to the effectiveness of the vaccines and found support for Hypothesis 3, that increases in vaccinations in the population will not negatively affect the number of cases. Furthermore, we identified a model that sought to incorporate the severity of COVID and the number of deaths in order to understand the vaccines' effect on them. However, this study was hampered by missingness within the data.
An opportunity for further research would be to collect a more complete data set to further analyze the effectiveness of the vaccines on severity of COVID and deaths.